We demonstrate how to generalize two of the best-known random graph
models, the classic random graph and random graphs with a given degree
distribution, by introducing hidden variables in the form of an extra
degree of freedom, color, applied to vertices or stubs (half-edges).
The color is assumed unobservable, but is allowed to affect edge
probabilities. This serves as a convenient method to define very general
classes of models within a common unifying formalism that allows for a
non-trivial edge correlation structure.

PACS numbers: 02.50.-r,64.60.-i, 89.75.Fb 

1. Introduction 

The availability of data on real-world networks, e.g. from information 
technology and molecular biology, has seen a dramatic increase in the last 
decades. This has led to a correspondingly increased interest in the theo- 
retical modelling of networks. 

Typically the growth of a real-world network is not entirely deterministic 
but contains stochastic elements, and statistical models are required that 
are conveniently formulated in terms of ensembles of graphs. Typical real- 
world networks are not static but change with time, and much of the focus 
has been on dynamical models, where one attempts to describe the growth 
and evolution of a network. 

Here we will focus on static random graph models, describing a snapshot 
of a network in terms of a fixed ensemble of graphs, without regard to how 
the network was formed. By a random graph we will mean a member of such 
an ensemble. In particular, we will be mostly interested in sparse random
graphs, where the typical vertex degree does not grow with the size of the 
graph. 

There is a vast spectrum of such models around. Some of these are not 
entirely random, in the sense that they are based on an underlying regular 
network - i.e. a lattice - which is then modified in a random fashion. 

Our focus will be on purely random graphs, where such an underlying 
regularity is absent. A number of more or less unrelated models of this type 
have been investigated, and it would obviously be desirable to devise a uni- 
fied description in terms of a general class of ensembles, where more special- 
ized models appear as special cases of one and the same general formalism, 
while maintaining the computability of local and global graph characteristics
of interest, such as degree distributions, small subgraph abundances,
component size distributions, and global connectivity properties.

The most well-known purely random model is the classic Random Graph
of Erdős and Rényi [1], to be referred to as RG. In its sparse version it
is defined as follows. For a given set of N nodes, every pair of nodes is 
connected by an edge independently with probability p = c/N in terms of 
a given parameter c that asymptotically defines the average degree. This 
model has many interesting properties, such as an asymptotically Poissonian 
degree distribution, and a phase transition at c = 1, above which a giant 
connected component is formed. However, it fails to describe most real- 
world networks. 

A more general model that has been much studied is Random Graphs 
with a given Degree Distribution [2, 3, 4, 5], or Degree-driven Random 
Graphs (DRG), where an asymptotic degree distribution is given, suitably 
transformed into a definite degree sequence for a given graph size. In terms 
of this a random graph is defined as a uniformly random member drawn 
from the set of graphs having the given degree sequence, possibly subject 
to additional constraints (e.g. by demanding the graph to be simple, i.e. 
non-degenerate). DRG models suffer from an intrinsic lack of edge correlations,
atypical of real-world networks; as a result, they are often referred to
as uncorrelated random graphs.

In a sequence of papers [6, 7, 8], I have explored the use of hidden 
coloring, either of vertices or of stubs, to define more general random graph 
models. The resulting models can be seen as colored extensions of RG and 
DRG. As will be shown in this paper, the hidden color provides a convenient 
means for defining very general classes of random graph models, where many
of the limitations of the uncolored models can be done away with, while the
computability of interesting properties is maintained.

The classes of models considered will be compared with respect to a
few basic properties: the degree distribution, the abundance of arbitrarily
given small subgraphs, the size distribution of connected components, and
the phase transition where a giant component appears.

The plan of the paper is as follows. In section 2, we will discuss a few 
fundamental concepts needed in the subsequent sections. In section 3, we 
will review the definitions of the models to be considered. A comparative 
analysis of a selected set of characteristics, as derived in the different model 
classes, is presented in section 4. Section 5, finally, contains a concluding 
discussion. 

2. Basic Concepts and Methods 

All models to be considered in this paper will be of sparse random 
graphs, where the degrees (connectivities) of vertices stay finite as the size 
N of the graph grows to infinity. In particular, this means that the total 
number of edges will scale as N, and that the probability of a connection 
between an arbitrary pair of nodes will scale as 1/N. 

A simple local characteristic of a graph ensemble is its degree distribution,
$\{p_m\}$. This is often conveniently described in terms of its generating
function,

$H(x) = \sum_m p_m x^m$ . (1)

It obviously satisfies $H(1) = 1$, and yields upon repeated differentiation at
$x = 1$ successive combinatorial moments of the degree,

$H'(1) = \langle m \rangle$ , $H''(1) = \langle m(m-1) \rangle$ , $H'''(1) = \langle m(m-1)(m-2) \rangle$ , (2)

etc., while the individual $p_m$ can be obtained by repeated differentiation at
$x = 0$.

Generating functions of this type are convenient when analyzing a probability
distribution of an integer variable $k$ that is the sum of several
independent contributions, $k = \sum_i k_i$, in which case the generating
function $f(z) = \sum_k p_k z^k$ for the distribution of $k$ is simply the product of the
corresponding generating functions for the distribution of each contribution.
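
The product property can be checked numerically. The following is a minimal sketch, not part of the original analysis: the two distributions p and q are made-up examples, and the helper names convolve and gen_fn are hypothetical.

```python
def convolve(p, q):
    """Distribution of k1 + k2 for independent k1 ~ p and k2 ~ q.

    p and q are lists with p[k] = probability of the value k.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def gen_fn(p, z):
    """Evaluate the generating function f(z) = sum_k p_k z^k."""
    return sum(pk * z**k for k, pk in enumerate(p))


p = [0.5, 0.3, 0.2]   # illustrative distribution of the first contribution
q = [0.1, 0.6, 0.3]   # illustrative distribution of the second contribution
s = convolve(p, q)    # distribution of the sum of the two contributions

z = 0.7
lhs = gen_fn(s, z)                  # generating function of the sum
rhs = gen_fn(p, z) * gen_fn(q, z)   # product of the two generating functions
```

Up to rounding, lhs and rhs agree for any z, which is the statement made in the text for two contributions.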

3. The Models 

Here follows a brief introduction to the models to be considered. 

3.1. The Classic Model - RG 

The classic random graph (RG) [1] has been thoroughly analyzed over 
the years [9, 10]. It is a model of simple (non-degenerate) labelled graphs
with a given set of N nodes, although it can be easily extended to include
also non-simple graphs [11]. We will consider its sparse version. It comes
in two essentially equivalent versions, one with a fixed number of edges, the 
other with a fixed probability for each possible edge; we will stick to the 
latter. 

Sparse RG has a single real parameter c > 0 controlling the abundance
of edges. For a given graph size N and a given value of c, an ensemble
of graphs is defined as follows. Each of the N(N - 1)/2 pairs of distinct
nodes is independently connected by an undirected edge with a common
probability p = c/N (assuming N > c).
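
As an illustration, a single member of this ensemble can be drawn as follows. This is a minimal sketch using a direct O(N^2) loop over pairs; the function name sample_rg, the seed and the parameter values are arbitrary choices made for the example.

```python
import random


def sample_rg(n, c, rng):
    """Return the edge list of one sparse RG sample with edge probability c/n."""
    p = c / n
    # Visit each unordered pair (i, j) with i < j exactly once.
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


rng = random.Random(0)   # fixed seed, used only to make the example reproducible
n, c = 400, 2.0
edges = sample_rg(n, c, rng)
mean_degree = 2 * len(edges) / n   # close to c for large n
```

Each edge contributes to the degree of two nodes, hence the factor of 2 in the mean degree.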

3.2. Inhomogeneous Random Graphs - IRG 

The classic RG model as described above can be generalized in a straightforward
way by assigning color to vertices and allowing edge probabilities to
be color-sensitive; the resulting class of models will be referred to as IRG,
for inhomogeneous random graphs [6].

A definite IRG model is specified in terms of 

• a color space, taken as [1, ..., K];

• a color distribution $\{r_a > 0, a = 1, \ldots, K\}$, with $\sum_a r_a = 1$;

• a real, symmetric color preference matrix $\mathbf{c} = \{c_{ab} \geq 0\}$.

For a given graph size N, such a model is implemented as follows.

1. Assign to each node independently a random color a, drawn from the
given distribution $\{r_a\}$.

2. Connect each pair of distinct nodes independently with probability
$c_{ab}/N$, where a and b are the respective colors of the two nodes.

By considering the color as unobservable or hidden, the resulting ensemble 
of colored graphs yields a specific ensemble of plain graphs, distinct from 
an RG ensemble, as will be shown below. The role of the hidden color is to 
enable non-trivial edge correlations. IRG defines a class of graph ensembles 
much more general than RG. 
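
The two-step IRG recipe above can be sketched in code as follows. This is only an illustration under stated assumptions: colors are labelled 0, ..., K-1 rather than 1, ..., K, the function name sample_irg is hypothetical, and the two-color parameters are made up.

```python
import random


def sample_irg(n, r, c, rng):
    """Draw one IRG sample; returns (colors, edges).

    r[a] is the probability of color a, and c[a][b] is the symmetric
    color preference matrix. Colors are labelled 0..K-1 here.
    """
    # Step 1: independent random color for every node.
    colors = rng.choices(range(len(r)), weights=r, k=n)
    # Step 2: connect each pair independently with probability c_ab / n.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < c[colors[i]][colors[j]] / n:
                edges.append((i, j))
    return colors, edges


rng = random.Random(1)               # fixed seed for reproducibility
r = [0.5, 0.5]                       # illustrative color distribution
c = [[4.0, 0.5], [0.5, 1.0]]         # illustrative preference matrix
colors, edges = sample_irg(400, r, c, rng)
```

Discarding the returned colors and keeping only the edge list gives a member of the plain-graph ensemble in which the color is hidden.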

3.3. Random graphs with a given degree distribution - DRG 

The classic RG model is limited to a Poissonian degree distribution. 
A more general class of models that has recently attracted the attention of 
several workers in the statistical physics community is random graphs with a 
given degree distribution [2, 4, 5], to be referred to as DRG (for degree-driven
random graphs).^1 This approach allows for an arbitrary degree distribution.

There are two common variants of DRG. One is given by restricting 
the ensemble to simple (non-degenerate) graphs, where self-couplings and 
multiple connections are banned. In the other (the configuration model) 
one allows for degenerate graphs. For ease of analysis, we will focus on the 
latter version where degeneracies are allowed. 

Some notation: A node with degree m is considered to possess m stubs,
each of which defines a point of attachment for an edge endpoint. The total
number of stubs $M = \sum_i m_i$ in a graph obviously must be even, being equal
to the total number of edge endpoints, i.e. twice the number of edges.

A specific DRG model is defined by specifying an arbitrary degree distribution
$\{p_m\}$. For a fixed graph size N,^2 the corresponding ensemble is
implemented as follows.

1. For each node, draw its degree independently from the given distribu- 
tion. Redo until the total sum of the degrees is even. 

2. Define random edges by performing a completely random pairing within 
the resulting even-numbered set of M stubs. 

This leads to an ensemble of pseudographs, where degeneracies may appear, 
in the form of self-connections (tadpoles) or multiple edges between the same 
pair of nodes. 
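
The two steps of the DRG construction can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name sample_drg and the example degree distribution are assumptions, and the uniform random pairing is realized by shuffling the stub list and pairing consecutive entries.

```python
import random


def sample_drg(n, p, rng):
    """Draw one DRG (configuration model) sample; returns (degrees, edges).

    p[m] is the probability of degree m. Self-connections and multiple
    edges are allowed, as in the degenerate variant described in the text.
    """
    # Step 1: draw all degrees, redoing the draw until the stub total is even.
    while True:
        degrees = rng.choices(range(len(p)), weights=p, k=n)
        if sum(degrees) % 2 == 0:
            break
    # Step 2: list every stub by its owner node, shuffle, and pair neighbours
    # in the shuffled list; this gives a uniformly random pairing of the stubs.
    stubs = [node for node, m in enumerate(degrees) for _ in range(m)]
    rng.shuffle(stubs)
    edges = list(zip(stubs[0::2], stubs[1::2]))
    return degrees, edges


rng = random.Random(2)                 # fixed seed for reproducibility
p = [0.1, 0.4, 0.3, 0.2]               # illustrative degree distribution
degrees, edges = sample_drg(200, p, rng)
```

An edge (i, i) in the output is a self-connection, and a repeated pair is a multiple edge.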

The random stub pairing is reminiscent of the combinatorics associated 
with Gaussian integrals; indeed, a relation exists between DRG models and 
certain miniature field theories [12, 13]. 

3.4. DRG plus color - CDRG 

Also the DRG class of models can be generalized, by utilizing a coloring 
of stubs, which turns out to be the most natural choice, and then allowing 
the stub pairing to be color-sensitive [7, 8]. The resulting very general class 
of models will be referred to as CDRG, for Colored DRG. 

With colored stubs, it is natural to consider the color-specific stub
content of a node, its colored degree. With K colors to choose between,
the colored degree is conveniently represented by an integer vector
$\mathbf{m} = (m_1, \ldots, m_K)$, with the individual elements $m_a$ counting the number of
stubs with color a. Obviously the plain degree m is obtained by summing
up the elements of the colored degree, $m = \sum_a m_a = \mathbf{1} \cdot \mathbf{m}$, in terms of the
uniform vector $\mathbf{1} = (1, \ldots, 1)$.

^1 Variants of this approach have been referred to under various names, such as
equilibrium random graphs and uncorrelated random graphs.

^2 We disregard impossible cases of an odd N with a degree distribution supporting
only odd degrees.

Then it is also natural to consider the probability distribution of such
colored degrees, a colored degree distribution $\{p_\mathbf{m}\}$. Such a distribution can
be represented by a multivariate generating function, $H(\mathbf{x}) = \sum_\mathbf{m} p_\mathbf{m} \mathbf{x}^\mathbf{m}$,
where $\mathbf{x}^\mathbf{m} = \prod_a x_a^{m_a}$, satisfying the normalizing condition $H(\mathbf{1}) = 1$. From
this, multivariate combinatorial moments can be derived by repeated
differentiation at $\mathbf{x} = \mathbf{1}$, e.g. $\partial_a H(\mathbf{x} = \mathbf{1}) = \langle m_a \rangle$,
$\partial_a \partial_b H(\mathbf{x} = \mathbf{1}) = \langle m_a m_b - m_a \delta_{ab} \rangle$, etc.

A specific CDRG model is defined by specifying

• A color space, taken as [1, ..., K];

• A colored degree distribution $\{p_\mathbf{m}\}$;

• A real, symmetric color preference matrix $\mathbf{T} = \{T_{ab} \geq 0\}$, such that
$\mathbf{T} \langle \mathbf{m} \rangle = \mathbf{1}$, i.e. $\sum_b T_{ab} \langle m_b \rangle = 1$ for every color a.

We will for simplicity assume that the colored degree distribution is such
that all moments are defined.

For a given graph size N, such a model is implemented as follows. 

1. For each node, draw its colored degree independently from the given 
distribution. Redo until the total sum of the (plain) degrees is even. 

2. Define random edges by performing a weighted random pairing within
the resulting even-numbered set of M stubs, such that each of the
(M - 1)!! possible pairings has a statistical weight proportional to the
product over all edges of a factor $T_{ab}$, where a, b are the colors of
the stubs that the edge connects.

This class of models obviously collapses to DRG for the case of a single
color, in which case the matrix $\mathbf{T}$ collapses to a single number, given by
$1/\langle m \rangle$ by virtue of the constraint $\mathbf{T} \langle \mathbf{m} \rangle = \mathbf{1}$.

The constraint on $\mathbf{T}$ is convenient for the forthcoming analysis, and ensures
that the total number of ab-edges asymptotically approaches the value
$N \langle m_a \rangle T_{ab} \langle m_b \rangle$, which upon summing over b yields the correct asymptotic
number of a-stubs as $N \langle m_a \rangle$.

The combinatorics of the weighted random pairing yields the following
asymptotic results. The probability that two arbitrary stubs with known
colors a, b will be paired with each other is $T_{ab}/N$. This implies that the
probability that two random nodes with respective colored degrees $\mathbf{m}$, $\mathbf{m}'$
are connected is $\sum_{ab} m_a T_{ab} m'_b / N$, which reduces (as it must!) to $\langle m \rangle / N$ if
the degrees are not known.

A less general class of models, similar in spirit to CDRG but restricted 
to homogeneously colored vertices (i.e. with vertex coloring rather than 
stub coloring), has also been investigated [14]. 

4. Analysis

Below follows a comparative analysis, where we review a selected set of 
local and global characteristics for random graphs as drawn from models of 
the different types, with a focus on the asymptotic limit $N \to \infty$. For a
more detailed analysis, we refer to the paper [8] and references therein. 



4.1. Asymptotic Degree Distributions

First we will derive the resulting asymptotic degree distributions, where 
not defined in the model specifications. 

4.1.1. RG degree distributions 

In a graph drawn from an RG model as described above, each node has
N - 1 possible connections, each independently realized with probability
c/N. Thus, the degree m of a random node will obey a binomial distribution,
$\binom{N-1}{m} (c/N)^m (1 - c/N)^{N-1-m}$. As $N \to \infty$ with fixed c, this
approaches an asymptotic distribution $\{p_m\}$, given by a Poissonian with
average c,

$p_m = e^{-c} \frac{c^m}{m!}$ , (3)

with the corresponding generating function $H(x) = e^{c(x-1)}$.
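
The approach of the binomial to the Poissonian limit can be checked numerically. The sketch below compares the two distributions over the degrees 0 to 10 for a small and a large graph; the helper names and the chosen values of c and N are illustrative assumptions.

```python
import math


def binomial_pmf(trials, p, m):
    """Probability of m successes in `trials` independent attempts."""
    return math.comb(trials, m) * p**m * (1.0 - p) ** (trials - m)


def poisson_pmf(c, m):
    """Poissonian with average c, as in eq. (3)."""
    return math.exp(-c) * c**m / math.factorial(m)


c = 2.0


def max_gap(n, m_max=10):
    """Largest deviation between the finite-N and asymptotic degree laws."""
    return max(
        abs(binomial_pmf(n - 1, c / n, m) - poisson_pmf(c, m))
        for m in range(m_max + 1)
    )


gap_small = max_gap(20)      # visible finite-size deviation
gap_large = max_gap(20000)   # much closer to the Poissonian limit
```

The deviation shrinks as N grows, consistent with the limit stated in the text.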

4.1.2. IRG degree distributions 

Choose a random node in a large graph from an IRG ensemble. It has the
color a with probability $r_a$. For large N there are $\sim N r_b$ other nodes with
color b; each of these is connected to the chosen node independently with
probability $c_{ab}/N$. Thus, for a node of given color a, we asymptotically
expect its number of b-neighbors to follow a Poissonian distribution with
average $c_{ab} r_b$, and its total degree to follow a Poissonian with average
$C_a = \sum_b c_{ab} r_b$. Averaging over a yields the asymptotic degree distribution

$p_m = \sum_a r_a e^{-C_a} \frac{C_a^m}{m!}$ , (4)

with the generating function

$H(x) = \sum_a r_a e^{C_a (x-1)}$ , (5)

which describes a Poissonian mix. This implies the following convexity
constraint on the possible degree distributions:

$p_m^2 \leq \frac{m+1}{m} p_{m-1} p_{m+1}$ , (6)

for m > 0. Conversely, any degree distribution obeying this constraint can,
at least in principle, be realized with a suitable IRG model, possibly with
infinitely many colors.
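
The Poissonian mix of eq. (4) is easy to evaluate directly. The sketch below does so for a made-up two-color model and checks the constraint in the form $p_m^2 \leq \frac{m+1}{m} p_{m-1} p_{m+1}$, which is how eq. (6) is read here; the function name and parameter values are assumptions for the example.

```python
import math


def irg_degree_distribution(r, c, m_max):
    """Asymptotic IRG degree distribution p_0..p_m_max from eq. (4)."""
    # C_a = sum_b c_ab r_b is the average degree of a node with color a.
    C = [sum(c_ab * r_b for c_ab, r_b in zip(row, r)) for row in c]
    return [
        sum(r_a * math.exp(-C_a) * C_a**m / math.factorial(m)
            for r_a, C_a in zip(r, C))
        for m in range(m_max + 1)
    ]


r = [0.5, 0.5]                  # illustrative color distribution
c = [[4.0, 0.5], [0.5, 1.0]]    # illustrative preference matrix
p = irg_degree_distribution(r, c, 25)

mean_degree = sum(m * pm for m, pm in enumerate(p))

# Check the constraint for m = 1..24; the small relative tolerance only
# absorbs floating-point rounding.
constraint_holds = all(
    p[m] ** 2 <= (m + 1) / m * p[m - 1] * p[m + 1] * (1.0 + 1e-9)
    for m in range(1, 25)
)
```

For these parameters the per-color averages are C = (2.25, 0.75), so the mean degree is their weighted average, 1.5.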

4.1.3. DRG degree distributions 

The degree distribution is considered given in a DRG model, and so can
in principle be chosen freely. For ease of analysis, we shall restrict our
considerations to cases where all moments $\langle m^n \rangle$ exist. This excludes
power-law distributions, which otherwise are interesting in their own right.

4.1.4. CDRG degree distributions 

In a CDRG model, a colored degree distribution is given, from which the
plain degree distribution can be extracted directly. Its generating function
H(x) is obtained simply by evaluating the multivariate generating function
(with the same name) for the colored degree distribution with a homogeneous
argument, $H(x) = H(x\mathbf{1}) = H(x, \ldots, x)$.

Since the colored degree distribution can be chosen freely, so can the plain
one, and there are obviously many CDRG models with a given degree
distribution.

4.2. Small Subgraph Statistics

The combinatorial moments of the degree distribution are simply related
to the expected numbers of subgraphs in the form of stars. More general
local characteristics can be expressed in terms of the number of copies of
an arbitrary small graph $\gamma$ found as subgraphs of a large random graph G.
We will be interested in the expected number of copies in the asymptotic
limit $N \to \infty$.

The clustering properties of a graph are often analyzed in terms of the 
probability of two neighbors of a node to be connected; this is seen to be 
related to the number of simple triangles, i.e. the number of subgraphs $\gamma$
in the form of a mutually connected triple of nodes.

Thus, assume an arbitrary small connected graph $\gamma$ to be given, having
$v \ll N$ vertices and $e \ll N$ edges. We can estimate its expected number
$\langle n_\gamma \rangle$ of distinct occurrences as a subgraph in a random graph G of size N
as follows.

A particular possible embedding of $\gamma$ in G is defined by mapping the
ordered set of v nodes in $\gamma$ onto a target set given by an ordered v-subset
of the N vertices in G. There are $N!/(N-v)! \sim N^v$ such sets. However,
for a target set to define a valid subgraph position, each edge in $\gamma$ must be
mapped onto an existing edge in the target set.

For the models considered, the expected count $\langle n_\gamma \rangle$ can be derived from
Feynman-like rules, with model-specific node and edge factors as well as
the usual symmetry factors.

4.2.1. Small subgraphs in RG 

The RG model describes simple random graphs, which can have only
simple subgraphs. For each of the $\sim N^v$ possible embeddings of a simple $\gamma$,
the probability for the corresponding set of e target edges to exist is $(c/N)^e$.

Thus, naively, the expected number of occurrences should be $N^{v-e} c^e$.
If $\gamma$ has a non-trivial isomorphism group, i.e. a symmetry under some
permutation of its vertices, the naive result has to be divided by the order
$S_\gamma$ of the symmetry group. This leaves us with the following simple rules
for the asymptotically expected subgraph count $\langle n_\gamma \rangle$.

• For each node in $\gamma$, associate a factor N.

• For each edge in $\gamma$, associate a factor c/N.

• Multiply the node and edge factors, and divide the result by the symmetry
factor $S_\gamma$.

Since $\gamma$ is assumed connected, we have $e \geq v - 1$, and $e - v + 1$ counts its
number of loops. Thus, the expected count scales as O(N) for a tree, and
as O(1) for a one-loop $\gamma$, while it vanishes asymptotically for $\gamma$ with several
loops. This is typical of a sparse random graph: loops are scarce.

As an example illustrating the lack of correlations in an RG ensemble,
consider subgraphs in the form of a v-chain, i.e. a set of v nodes connected
in an open chain; the expected counts show a simple geometric behaviour,
$\langle n_\gamma \rangle = N c^{v-1}/2$.
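
The RG counting rules amount to a one-line formula, which the following sketch evaluates for three small subgraphs. The function name is hypothetical, the values of N and c are arbitrary, and the symmetry factors (2 for a 3-chain, 6 for a triangle, 4 for a 4-cycle with one chord) are supplied by hand.

```python
def rg_expected_count(n, c, v, e, symmetry):
    """Asymptotic RG count: N per node, c/N per edge, divided by S_gamma."""
    return n**v * (c / n) ** e / symmetry


n, c = 10**6, 3.0

# Tree (3-chain): v = 3, e = 2, symmetry factor 2; scales as O(N).
chain3 = rg_expected_count(n, c, v=3, e=2, symmetry=2)

# One loop (triangle): v = 3, e = 3, symmetry factor 6; stays O(1).
triangle = rg_expected_count(n, c, v=3, e=3, symmetry=6)

# Two loops (4-cycle with one chord): v = 4, e = 5, symmetry factor 4;
# suppressed by a factor 1/N.
two_loop = rg_expected_count(n, c, v=4, e=5, symmetry=4)
```

The three results illustrate the scaling with the number of loops: the chain count equals N c^2 / 2, the triangle count equals c^3 / 6 independently of N, and the two-loop count vanishes as N grows.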

4.2.2. Small subgraphs in IRG 

In IRG, too, graphs are simple, so here as well we must assume $\gamma$ to be
simple. Generalizing the arguments used for RG, we get the following rules
for the asymptotically expected number $\langle n_\gamma \rangle$.

• Associate with each node in $\gamma$ an independent color a, and a corresponding
factor $N r_a$.

• Associate with each edge in $\gamma$ a factor $c_{ab}/N$, where a, b are the node
colors at its endpoints.

• Multiply all node and edge factors, sum over the node colors, and
divide the result by the symmetry factor $S_\gamma$.

Again, expected counts for tree subgraphs scale as O(N), and those for
connected one-loop subgraphs as O(1).

Non-trivial edge correlations are possible in IRG, as illustrated by v- 
chain subgraphs, where the hidden color in an IRG ensemble enables the 
expected counts to deviate from the simple geometric behavior found for a 
plain RG ensemble; instead it takes the form of a mix of geometric sequences. 
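
For a v-chain the IRG rules reduce to a sum over the colors along the chain, which can be accumulated one node at a time. The sketch below is an illustration under assumptions: the function name is hypothetical, the two-color parameters are made up, and v >= 2 is assumed so that the symmetry factor is 2.

```python
def irg_chain_count(n, r, c, v):
    """Asymptotic IRG count of v-chains (v >= 2).

    Evaluates (N/2) * sum over colors of r_a1 c_a1a2 r_a2 ... c_a(v-1)av r_av.
    """
    K = len(r)
    w = list(r)   # w[a]: weight of partial chains whose last node has color a
    for _ in range(v - 1):
        # Extend every partial chain by one edge and one new node of color b.
        w = [sum(w[a] * c[a][b] for a in range(K)) * r[b] for b in range(K)]
    return n * sum(w) / 2


# With a single color the rule reproduces the RG result N c^(v-1) / 2.
single = irg_chain_count(1000, [1.0], [[2.0]], 4)

# Two colors: successive ratios of the counts are no longer constant.
r = [0.5, 0.5]
c = [[4.0, 0.5], [0.5, 1.0]]
count2 = irg_chain_count(1000, r, c, 2)
count3 = irg_chain_count(1000, r, c, 3)
count4 = irg_chain_count(1000, r, c, 4)
ratio_a = count3 / count2
ratio_b = count4 / count3
```

In the single-color case the ratio of successive counts is always c; in the two-color example ratio_a and ratio_b differ, which is the deviation from simple geometric behavior described in the text.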

4.2.3. Small subgraphs in DRG 

Since a DRG ensemble of the kind we are considering allows for degen- 
eracies, we will have to consider also possibly degenerate subgraphs, with 
loops of length one or two. Since subgraphs with loops are suppressed due 
to the sparsity, just as in RG and IRG, degeneracies will turn out not to be 
very important. 

The expected number of copies of $\gamma$ with a fixed set of target nodes can
be calculated as follows. Consider a node in the target set with actual degree
m, that defines the target for a node with degree k in $\gamma$. The corresponding
k target edges can be chosen among the m existing ones in $m^{\underline{k}} = m!/(m-k)!$
distinct ways. This can be shown to yield the following rules for calculating
the asymptotic $\langle n_\gamma \rangle$ for a DRG model.

• Associate with each node with k stubs in $\gamma$ a factor $N \langle m^{\underline{k}} \rangle$.

• Associate with each edge in $\gamma$ a factor $1/(N \langle m \rangle)$.

• Multiply the node and edge factors, and divide the result by the symmetry
factor $S_\gamma$, including possible contributions from edge permutations
and flips for the case of a non-simple $\gamma$.

Here, $\langle m^{\underline{k}} \rangle$ stands for the kth combinatorial moment, defined by
$d^k H/dx^k |_{x=1} = \langle m(m-1) \cdots (m-k+1) \rangle$ (see eq. (2)).

For a v-chain, the expected count becomes $N \langle m \rangle^{3-v} \langle m(m-1) \rangle^{v-2}/2$,
displaying simple geometric behavior just as for the case of RG, illustrating
the lack of edge correlations in DRG.
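
The DRG rules can be applied mechanically to a v-chain. In the sketch below the helper names are hypothetical and the degree distribution (degrees 1 and 3 with equal probability) is a made-up example; v >= 2 is assumed.

```python
def falling_moment(p, k):
    """kth combinatorial moment <m(m-1)...(m-k+1)> of the distribution p[m]."""
    total = 0.0
    for m, pm in enumerate(p):
        term = pm
        for j in range(k):
            term *= m - j
        total += term
    return total


def drg_chain_count(n, p, v):
    """Asymptotic DRG count of v-chains (v >= 2), built from the rules."""
    m1 = falling_moment(p, 1)   # <m>
    m2 = falling_moment(p, 2)   # <m(m-1)>
    # Two end nodes with one stub each, v - 2 inner nodes with two stubs each.
    node_factors = (n * m1) ** 2 * (n * m2) ** (v - 2)
    # v - 1 edges, each contributing 1 / (N <m>).
    edge_factors = (1.0 / (n * m1)) ** (v - 1)
    return node_factors * edge_factors / 2   # symmetry factor 2 for a chain


p = [0.0, 0.5, 0.0, 0.5]        # illustrative: degree 1 or 3, equally likely
count = drg_chain_count(1000, p, 4)

# Closed form quoted in the text: N <m>^(3-v) <m(m-1)>^(v-2) / 2.
closed_form = 1000 * falling_moment(p, 1) ** (3 - 4) * falling_moment(p, 2) ** (4 - 2) / 2
```

Here <m> = 2 and <m(m-1)> = 3, so both expressions give 1000 * 9 / (2 * 2) = 2250.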

4.2.4. Small subgraphs in CDRG 

To analyze the subgraph statistics for a CDRG model, we will need the
colored generalizations of the combinatorial moments,

$E_{abc\ldots} = \partial_a \partial_b \partial_c \cdots H(\mathbf{x} = \mathbf{1})$ , (7)

with the lowest ones given by $E_a = \langle m_a \rangle$, $E_{ab} = \langle m_a m_b - m_a \delta_{ab} \rangle$, etc.
Generalizing the argument used for DRG, one can derive the following rules
[8] for the asymptotically expected subgraph counts in a CDRG model.

• Associate with each stub in $\gamma$ an independent color label.

• Associate with each node in $\gamma$ having k stubs a factor $N E_{abc\ldots}$, where
a, b, c, ... are the k associated color labels.

• Associate with each edge in $\gamma$ a factor $T_{ab}/N$, where a, b are the color
labels associated with the stubs at its endpoints.

• Multiply the node and edge factors, sum over the associated colors,
and divide the result by the symmetry factor $S_\gamma$, including possible
contributions from edge permutations and flips for the case of a
non-simple $\gamma$.

Note how these reduce to the DRG rules for the case of a single color. 

For the case of a v-chain, the expected count becomes a mix of geometric
sequences, showing how the coloring enables non-trivial edge
correlations for CDRG as well, just as was the case for IRG.



4.3. Connected Component Sizes

Next we turn to an analysis of the global connectivity characteristics of
a random graph. These are most simply described in terms of the sizes of the
connected components of the graph.

Thus, we will be interested in the size distribution $P_n$ of a connected
component of a random graph, as revealed from a randomly chosen initial
node by recursively exploring edges leading to new nodes until the entire
component is revealed. For sparse random graphs in the asymptotic limit
$N \to \infty$, any finite component is almost surely a tree, since any extra
connections will be suppressed by factors of 1/N. Thus, the revelation of
such a component can be described as a branching process, with properties
depending on the specific model considered.

The asymptotic component size distribution is conveniently analyzed in
terms of its generating function,

$g(z) = \sum_n P_n z^n$ . (8)

For the models considered, g(z) or a set of related functions will satisfy
recursive equations that determine the sought distribution.

4.3.1. Component sizes in RG

For an RG model, the component size distribution can be estimated as
follows, as long as the component remains small. For each revealed node
i, the number k of branches to new nodes obeys a Poissonian distribution
asymptotically, $p_k \approx e^{-c} c^k / k!$, since there are $\sim N$ remaining unrevealed
nodes, each of which connects to i with probability c/N.

This yields the recursive equation $g(z) = z e^{-c} \sum_k c^k g(z)^k / k!$, to be
understood as follows. The initial factor of z accounts for the initial node,
while each term in the sum describes the case where it has a distinct number
k of neighbors. The factor $e^{-c} c^k / k!$ represents the probability for this
case, and the factor $g(z)^k$ encodes the fact that each of the k neighbors
defines a subtree statistically identical to the full tree.

The recursion can be simplified to read 

$g(z) = z e^{c (g(z) - 1)}$ , (9)

which should be interpreted as an iterated map for the value of g for a given 
value of z, a stable fixed point of which defines the physical value. 

As a curiosity, eq. (9) can be written as $F(c\,g(z)) = z F(c)$, with
$F(c) = c e^{-c}$, with the explicit solution $g(z) = F^{-1}(z F(c))/c$, with the
inverse of F defined from the restriction of F(c) to |c| < 1. Taylor-expanding the
inverse yields the exact solution $P_n = \frac{(n c e^{-c})^n}{c\, n\, n!}$ for $n \geq 1$ for the asymptotic
component size distribution, with the large-n behaviour
$P_n \to \frac{(c e^{1-c})^n}{c \sqrt{2\pi}\, n^{3/2}}$,
decaying exponentially for $c \neq 1 \Rightarrow c e^{1-c} < 1$, but only as a power for
c = 1.
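
The closed form for the size distribution can be cross-checked against a direct iteration of eq. (9). The sketch below assumes the reading $P_n = (n c e^{-c})^n / (c\, n\, n!)$ of the exact solution and a subcritical value c = 0.5; the function names, the iteration count and the truncation of the sums are arbitrary choices.

```python
import math


def rg_component_size_pmf(c, n):
    """P_n = (n c e^{-c})^n / (c n n!), evaluated via logarithms for stability."""
    log_p = n * (math.log(n * c) - c) - math.log(c * n) - math.lgamma(n + 1)
    return math.exp(log_p)


def rg_generating_function(c, z, iterations=2000):
    """Iterate the map g -> z exp(c (g - 1)) of eq. (9), starting from g = 0."""
    g = 0.0
    for _ in range(iterations):
        g = z * math.exp(c * (g - 1.0))
    return g


c, z = 0.5, 0.9

# Power series sum_n P_n z^n, truncated where the terms are negligible.
series = sum(rg_component_size_pmf(c, n) * z**n for n in range(1, 200))

# Stable fixed point of the recursion at the same z.
fixed_point = rg_generating_function(c, z)

# Normalization check in the subcritical case.
total = sum(rg_component_size_pmf(c, n) for n in range(1, 2000))
```

For c < 1 the series and the fixed point agree, and the probabilities sum to one; the first term, P_1 = e^{-c}, is the probability that the initial node is isolated.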



4.3.2. Component sizes in IRG 



For an IRG model, the asymptotic component size distribution, and
thus its generating function g(z), will obviously be an average over the
result conditional upon a particular color of the initial node, and we can
write

$g(z) = \sum_a r_a g_a(z)$ , (10)

where $g_a(z)$ is conditional upon initial color a; these satisfy the following
recursive relations,

$g_a(z) = z \exp\left[ \sum_b c_{ab} r_b (g_b(z) - 1) \right]$ , (11)

an obvious generalization of the corresponding RG result, eq. (9). This
cannot in general be solved exactly, but can be analyzed using numerical
and/or series expansion methods.

4.3.3. Component sizes in DRG

Next we wish to obtain the asymptotic size distribution $\{P_n\}$ in a DRG
model. As before, the sparsity forces finite components to take the form of
trees.

The generating function H(x) for the degree distribution will turn out to
be convenient. Of interest is also the degree distribution of a node reached by
following a random edge. This yields a weighting of nodes by their degree,
resulting in the modified distribution $q_m = m p_m / \langle m \rangle$. The generating
function for its remaining degree (disregarding the incoming stub) becomes

$H'(x)/H'(1)$ .

With g(z) as before the generating function for the size distribution of
the entire component, let h(z) be the analogous generating function for the
size distribution of a subtree found by following an edge. Then g(z) can be
expressed in terms of h(z) as

$g(z) = z \sum_m p_m h(z)^m = z H(h(z))$ , (12)

to be interpreted as follows. The explicit factor of z represents the first
node. It has m outgoing edges with probability $p_m$, each of which represents
a subtree and yields a factor h(z); see fig. 1 for a graphical illustration.

Fig. 1. g(z) in terms of h(z) for a DRG model, illustrating eq. (12).

By a similar argument, h(z) satisfies the recursion

$h(z) = z \sum_m \frac{m p_m}{\langle m \rangle} h(z)^{m-1} = z \frac{H'(h(z))}{H'(1)}$ , (13)

as depicted in fig. 2.

Fig. 2. Illustration of the recursive relation (13) for h(z).

Note that for the case of a Poissonian degree distribution, the recursion
simplifies to the RG result. Indeed, the Poissonian
restriction of DRG is asymptotically equivalent to a version of RG allowing
for non-simple graphs.

4.3.4. Component sizes in CDRG

Finally, we wish to obtain the asymptotic size distribution $\{P_n\}$, and its
associated generating function g(z), in a CDRG model.

Here, the multivariate generating function $H(\mathbf{x})$ for the colored degree
distribution will be needed. Of interest is also the colored degree distribution
$q_{\mathbf{m}|a}$ of a node reached by following a random edge emanating from a stub
of given color a. This is given by $q_{\mathbf{m}|a} = \sum_b T_{ab} m_b p_\mathbf{m}$. It follows that
the generating function for the distribution of its remaining colored degree
(where the incoming stub is neglected) is $\sum_b T_{ab} \partial_b H(\mathbf{x})$.

With g(z) having its usual meaning, we will denote by $h_a(z)$ the analogous
generating function for the size distribution of a subtree found by
following an edge emanating from a stub of color a. Then, generalizing eq.
(12), g(z) can be expressed in terms of $\mathbf{h}(z) = (h_1(z), \ldots, h_K(z))$ as

$g(z) = z \sum_\mathbf{m} p_\mathbf{m} \prod_a h_a(z)^{m_a} = z H(\mathbf{h}(z))$ , (14)

with the following interpretation. The explicit factor of z accounts for the
first node, which has a colored degree $\mathbf{m}$ with probability $p_\mathbf{m}$; each stub
of color a represents a subtree and yields a factor $h_a(z)$. The argument
generalizes the one used for DRG, as depicted in fig. 1.

By a similar argument, $\mathbf{h}(z)$ satisfies the coupled recursion

$h_a(z) = z \sum_b T_{ab} \partial_b H(\mathbf{h}(z))$ , (15)

generalizing the DRG relation, eq. (13), depicted in fig. 2.

4.4. The Appearance of the Giant

For all models, g(z) is the generating function for the component size
distribution $\{P_n\}$, and normalization of probability requires g(1) = 1. Indeed,
this corresponds to a fixed point of the recursions for z = 1 in all
models. However, it is a physical solution only if it corresponds to a stable
fixed point of the associated recursive equations. Where it fails to be stable,
a competing solution with g(1) < 1 will take over, yielding a probability
deficit of magnitude 1 - g(1).

For the asymptotically resulting branching process this can be interpreted
as being due to a finite probability to obtain an infinite tree. For a
finite but large graph, it corresponds to the appearance of a giant component,
asymptotically containing a finite fraction 1 - g(1) of the nodes, and
the transition where the naive fixed point loses stability defines a percolation
threshold, typically of second order. Below the threshold, all components are
small, and above it there is a single giant, while the remaining components
are small.
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4.4.1. The giant in RG 

For z = 1, the recursion (9) for g = g(l) simplifies to g — ► e c ^ 9_1 \ with 
a solution satisfying cge~ cg = ce~ c . The stability of a solution depends on 
the magnitude of the Jacobian, given by ce c ^ 9 ~ 1 \ which equals eg when g 
is a solution. 

It has the trivial solution eg = c, i.e. g = 1. For c smaller than a critical 
value, c = 1, this solution is indeed stable under iteration of the recursion, 
with the Jacobian given by c. 

For c > 1, it fails to be stable, and so we must look for another fixed
point as the physical solution. Indeed, such a fixed point exists, as follows
from looking at a graph of the function $c \mapsto c e^{-c}$, which has a
unique maximum for c = 1. Thus, for each c > 1 there is a dual value
$\tilde{c} < 1$ with the same value of this function, yielding the stable
solution $g(1) = \tilde{c}/c < 1$.

Thus, we have established a probability deficit for c > 1, reflecting the
existence of a giant component, asymptotically containing a finite fraction
$1 - \tilde{c}/c$ of the nodes. The critical point
$c = 1 \Rightarrow \tilde{c} = 1$ defines the percolation threshold, above
which there is a finite probability for an arbitrary pair of nodes to be
connected via a finite path.
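
As a numerical illustration of the argument above, the following minimal
Python sketch iterates the recursion g -> exp(c(g-1)) from g = 0, which
converges to the smallest fixed point in [0, 1], and then reads off the dual
value as c g. The function names, the starting point and the tolerance are
illustrative assumptions, not part of the formalism of the paper.

```python
import math


def rg_fixed_point(c, tol=1e-12, max_iter=100_000):
    """Iterate g -> exp(c (g - 1)) starting from g = 0.

    The map is increasing on [0, 1], so the iterates grow monotonically
    towards the smallest fixed point: g = 1 for c < 1, and the competing
    solution g < 1 for c > 1. Convergence is slow close to c = 1.
    """
    g = 0.0
    for _ in range(max_iter):
        g_new = math.exp(c * (g - 1.0))
        if abs(g_new - g) < tol:
            return g_new
        g = g_new
    return g


def rg_jacobian(c, g):
    """Derivative of the map g -> exp(c (g - 1)); equals c * g at a fixed point."""
    return c * math.exp(c * (g - 1.0))


if __name__ == "__main__":
    for c in (0.5, 2.0):
        g = rg_fixed_point(c)
        dual = c * g  # dual value: same value of c * exp(-c) as the original c
        print(c, g, dual, 1.0 - g, rg_jacobian(c, g))
```

For c = 2 the sketch gives a giant fraction 1 - g(1) of about 0.80, with a
Jacobian below unity at the non-trivial fixed point.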

4.4.2. The giant in IRG 

For an IRG model, g = g(1) is given by the linear combination
$g = \mathbf{r} \cdot \mathbf{g} = \sum_a r_a g_a$, with
$\mathbf{g} = \mathbf{g}(1)$ satisfying the recursion
$g_a \to \exp\left[\sum_b c_{ab} r_b (g_b - 1)\right]$, as follows from
setting z = 1 in eqs. (10,11). The stability of the trivial solution
$\mathbf{g} = \mathbf{1}$ depends on the spectrum of the local Jacobian
matrix $J = \{c_{ab} r_b\}$.

The case of the largest eigenvalue of J being exactly unity defines a
critical hypersurface in parameter space, beyond which the trivial fixed point
$\mathbf{g}(1) = \mathbf{1} \Rightarrow g(1) = 1$ loses stability, and a
competing fixed point appears with
$\mathbf{g}(1) < \mathbf{1} \Rightarrow g(1) < 1$. Again, the corresponding
probability deficit $1 - g(1)$ is taken as the probability for winding up in
a giant component of size $N(1 - g(1))$.
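
The same procedure can be sketched for a small IRG model. The Python code
below is a minimal illustration, assuming a finite number of colors, a
preference matrix c given as a nested list and color probabilities r given
as a list; the function names and the two-color example values are
hypothetical. It estimates the largest eigenvalue of J = {c_ab r_b} by power
iteration (which presupposes non-negative entries and a dominant eigenvalue)
and iterates the recursion for the vector g from zero.

```python
import math


def irg_step(g, c, r):
    """One application of g_a -> exp[sum_b c_ab r_b (g_b - 1)]."""
    k = len(r)
    return [
        math.exp(sum(c[a][b] * r[b] * (g[b] - 1.0) for b in range(k)))
        for a in range(k)
    ]


def irg_fixed_point(c, r, tol=1e-12, max_iter=100_000):
    """Iterate the recursion from g = 0 until successive iterates agree."""
    g = [0.0] * len(r)
    for _ in range(max_iter):
        g_new = irg_step(g, c, r)
        if max(abs(x - y) for x, y in zip(g_new, g)) < tol:
            return g_new
        g = g_new
    return g


def largest_eigenvalue(c, r, n_iter=500):
    """Power iteration for J = {c_ab r_b}, normalising by the largest component."""
    k = len(r)
    v = [1.0] * k
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(c[a][b] * r[b] * v[b] for b in range(k)) for a in range(k)]
        lam = max(w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def irg_giant_fraction(c, r):
    """Probability deficit 1 - g(1), with g(1) = sum_a r_a g_a."""
    g = irg_fixed_point(c, r)
    return 1.0 - sum(ra * ga for ra, ga in zip(r, g))


if __name__ == "__main__":
    c = [[3.0, 1.0], [1.0, 0.5]]  # hypothetical two-color preference matrix
    r = [0.5, 0.5]                # hypothetical color probabilities
    print(largest_eigenvalue(c, r), irg_giant_fraction(c, r))
```

A largest eigenvalue above unity should go together with a strictly positive
giant fraction, and one below unity with a vanishing fraction.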

4.4.3. The giant in DRG 

For a DRG model, setting z = 1 in eqs. (12,13) yields for g = g(1) and
h = h(1) the relation $g = H(h)$ and the recursion $h \to H'(h)/H'(1)$, with
the trivial fixed point $h = 1 \Rightarrow g = 1$. The stability of this
fixed point is governed by the Jacobian
$H''(1)/H'(1) = \langle m(m-1)\rangle / \langle m\rangle$. Stability results
if this is smaller than unity, i.e. if $\langle m(m-2)\rangle < 0$, defining
the subcritical domain of DRG [4].
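
The step from the Jacobian to the moment condition can be spelled out in one
line. The sketch below assumes that H is the generating function of the
degree distribution, $H(h) = \sum_m p_m h^m$ (the symbol $p_m$ is introduced
here for illustration), so that $H'(1) = \langle m\rangle > 0$ and
$H''(1) = \langle m(m-1)\rangle$.

```latex
\frac{H''(1)}{H'(1)}
  = \frac{\langle m(m-1)\rangle}{\langle m\rangle} < 1
\;\Longleftrightarrow\;
\langle m(m-1)\rangle - \langle m\rangle < 0
\;\Longleftrightarrow\;
\langle m(m-2)\rangle < 0 .
```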

In the supercritical domain, there will be a unique competing solution
$h < 1$, satisfying $h H'(1) = H'(h)$, yielding $g < 1$, with the
corresponding probability deficit indicating the existence of a giant
component.
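
For a concrete degree distribution, the competing solution can be found by
direct iteration. The following Python sketch is only an illustration under
stated assumptions: the degree distribution is passed as a dictionary
mapping degree m to probability p_m, H is taken to be the generating
function sum_m p_m h^m, and the function names and example distributions are
hypothetical.

```python
def drg_criterion(p):
    """Return <m(m-2)>; negative values indicate the subcritical domain."""
    return sum(m * (m - 2) * pm for m, pm in p.items())


def drg_fixed_point(p, tol=1e-12, max_iter=100_000):
    """Iterate h -> H'(h)/H'(1) from h = 0 and return (h, g) with g = H(h)."""
    mean = sum(m * pm for m, pm in p.items())  # H'(1) = <m>

    def H(h):
        return sum(pm * h ** m for m, pm in p.items())

    def dH(h):
        # Degree-zero vertices do not contribute to the derivative.
        return sum(m * pm * h ** (m - 1) for m, pm in p.items() if m > 0)

    h = 0.0
    for _ in range(max_iter):
        h_new = dH(h) / mean
        if abs(h_new - h) < tol:
            h = h_new
            break
        h = h_new
    return h, H(h)


if __name__ == "__main__":
    for p in ({1: 0.5, 3: 0.5}, {1: 0.8, 3: 0.2}):
        h, g = drg_fixed_point(p)
        print(drg_criterion(p), h, g, 1.0 - g)
```

In the first example, with half of the vertices of degree 1 and half of
degree 3, the recursion reads h -> 1/4 + (3/4) h^2, whose non-trivial fixed
point is h = 1/3, giving g = 5/27 and a giant fraction of 22/27. The second
example has <m(m-2)> < 0 and the iteration returns to the trivial fixed
point.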

4.4.4. The giant in CDRG

Similarly, in a CDRG model, we can pinpoint the subcritical region by
analyzing the stability of the trivial solution $\mathbf{h}(1) = \mathbf{1}$
of the recursion (15) with z = 1, amounting to
$\mathbf{h} \to T \partial H(\mathbf{h})$. The Jacobian amounts to J = TE,
i.e. the matrix product of T and the matrix $E = \{E_{ab}\}$ of second-order
multivariate combinatorial moments of the colored degree distribution, as
defined in eq. (7), and subcriticality corresponds to the largest eigenvalue
of J being smaller than unity.

In the supercritical region, there will be a non-trivial solution yielding
g(1) < 1, with an associated probability deficit and a giant component of
corresponding relative size.

5. Discussion 

All of the models discussed in this article admit versions with or without
the restriction to simple graphs. They share several nice properties, such
as the computability of interesting local and global characteristics, and
the existence of a phase transition in the form of a percolation threshold,
where a giant component appears.

The sparse RG model is a mathematically very interesting object.
Nevertheless, it is severely limited as a model of real-world networks. Its
degree distribution is restricted to be Poissonian, and it suffers from a
fundamental lack of correlations between edges. Its main importance is as a
role model for more general random graph models.

The DRG approach yields a general class of random graph models, and 
contains a non-simple version of RG as a special case. Although it admits 
arbitrary degree distributions, it shares with RG a fundamental lack of edge 
correlations. 

Generalizing RG by adding hidden variables in the form of unobservable 
vertex colors, allowed to affect edge probabilities, yields another general 
class of models - IRG. It admits arbitrarily many distinct models for a 
single degree distribution, and displays non-trivial edge correlation. Its 
most serious limitation lies in the restriction of the degree distribution to
a Poissonian mix (which however does not exclude power-law behaviour!). It
trivially contains RG as a special case, and its restriction to a rank one
preference matrix, $c_{ab} = C_a C_b$, defines a class of uncorrelated
models that has been shown to be asymptotically equivalent to the restriction
of DRG to Poissonian mixtures [6]. Thus, IRG and DRG define distinct superclasses
of the classic RG model, and one might expect that there exists a larger 
class that contains them both as distinct restrictions. 

Such a unified class of models indeed exists. The generalization of DRG
to models with unobservable color on individual stubs, which is allowed to
affect the edge probabilities as emerging from the stub pairing statistics,
yields a very general class of models - CDRG. It allows for arbitrary degree
distributions, as well as for non-trivial edge correlations. It contains as
distinct subclasses both DRG (trivially) and IRG (as the restriction of the
colored degree distribution to a mix of multivariate Poissonians) [7, 8].

CDRG shares with DRG an interesting relation to Feynman graphs of
simple field theories; work is in progress to explore this relation. CDRG
should also admit a straightforward extension to cover models of directed
graphs as well.

A unifying formalism for random graphs appears to be a prerequisite for 
the possibility to devise a systematic model inference scheme based on the 
observed properties of real-world networks. CDRG appears to be a step on 
the way to such a formalism for sparse, truly random graphs. 